If one goes backward in time, the number of ancestors of an individual doubles at each generation. This exponential
growth very quickly exceeds the population size, when this size is finite. As a consequence, the ancestors of a given
individual cannot all be different, and most remote ancestors are repeated many times in any genealogical tree. The
statistical properties of these repetitions in genealogical trees of individuals for a panmictic closed population of
constant size $N$ can be calculated. We show that the distribution of the repetitions of ancestors reaches a stationary
shape after a small number $G_c \propto \log N$ of generations in the past, that only about 80% of the ancestral population
belongs to the tree (due to coalescence of branches), and that two trees for individuals in the same population become
identical after $G_c$ generations have elapsed. Our analysis is easy to extend to the case of an exponentially growing
population.



I. INTRODUCTION 



In the case of sexual reproduction, the ancestry of an individual is formed by 2 parents, 4 grandparents two
generations ago, and in general $2^G$ individuals $G$ generations back into the past. The explosive growth of the number
of ancestors belonging to the genealogical tree of, say, a present human should stop at some point due, at least, to
the finite size of previous populations. For instance, only $G \simeq 33$ generations ago (spanning a period of less than
a thousand years), the number of potential ancestors in the tree of any of us is about $8.5 \times 10^9$, more than the present
population of the Earth, and of course much larger than the population living around the year 1000. The answer to this
apparent paradox is simple: the branches of a typical genealogical tree often coalesce, indicating that many of the
ancestors were in fact relatives and appear repeatedly in the tree (Ohno, 1996; Derrida et al., 1999; Gouyon, 1999). It
might be difficult to test the statistical properties of such repetitions for an actual large, randomly mating population.
Nevertheless, some exceptions can be found in royal genealogy. Since nobles usually married within their own castes,
the presence of repeated ancestors in royal genealogical trees is far from rare. The example of the English king Edward
III, where some ancestors appear up to six times, has been analysed in our previous work (Derrida et al., 1999).[1]

Much attention has been paid in the past to a related problem, namely the statistical properties of branching
processes (Harris, 1963) and their applications to the characteristics of the successive descendants of a single ancestor
(Kingman, 1993). Actually, the first applications of the branching-process technique go back to the twenties. J.B.S.
Haldane (Haldane, 1927) calculated the probability that a mutant allele be fixed in a population through a method
developed previously by R.A. Fisher (Fisher, 1922). There, the relevant quantity was the survival probability of the
descendants of the first individual carrying the mutation. All these studies apply to the vertical transmission of names,
to the inheritance of characters coming only from one of the parents, like mitochondrial DNA or the Y chromosome,
or to the fate of a mutant gene, for example, and correspond to an effective monoparental population. The heart of
our problem is to take into consideration that reproduction is biparental. The distribution of repetitions of ancestors
described below does however satisfy an equation similar to those which appear in branching processes (Harris, 1963).

Our problem of repetitions of ancestors in genealogical trees is much closer to the counting of the descendants
of an individual in a sexual population. For example, in the case of a population of constant size, the average
number of offspring is two per couple. Therefore after $G$ generations each individual has on average $2^G$ descendants.
What prevents the number of descendants from growing exponentially with $G$ and exceeding the population size is
interbreeding: when $2^G$ becomes comparable to the population size, interbreeding happens between the descendants
and different lines of descent coalesce. The problem of the statistical properties of these coalescences is very similar
to our present study of genealogical trees. To our knowledge, neither problem has yet been analysed.

[1] We used the tree of Edward III, which can be found at http://uts.cc.utexas.edu/~churchh/edw3chrt.html

In the present work, we study theoretically the problem of repetitions in the genealogical trees in the case of a closed,
panmictic population. The study of the properties of a single tree with coalescent branches and the comparison of
the genealogical trees of two contemporary individuals allow us to show that

1. There is a finite fraction (about 20% for a population of constant size $N$) of the initial population whose
descendants become extinct after a number of generations $G_c \propto \log N$. All the rest of the initial population
(about 80%) belongs to all genealogical trees,

2. The distribution of the repetitions of ancestors living more than $G_c$ generations ago reaches a stationary shape
after about $G_c$ generations,

3. The genealogical trees of two individuals in the same population become identical after a small number of
generations $G_c$ back into the past,

4. The similarity between two genealogical trees changes from 1% (almost all ancestors in the two trees are different)
to 99% (the repetitions of the ancestors in the two trees are almost identical) within 14 generations around $G_c$,
independently of the population size $N$.

Our work can be generalized (see section IV) to describe coalescent processes, understood as the study of the
gene tree that originates when looking for the ancestry of a random sample of sequences (Kingman, 1982; Hudson, 1991;
Donnelly & Tavare, 1995). In the absence of recombination, each sequence has a single ancestor. The topology of the
trees thus reconstructed is equivalent to that generated through branching processes. Next in complexity, one can consider
a two-loci sequence and assume that recombination can occur only between the two loci and with a small probability
(meaning correlated genealogies [2] for the two loci). The statistical properties of such a process can be estimated until
the most recent common ancestor (MRCA) is reached (Hudson, 1991). Instead, if one faces the study of a chromosome
(Wiuf & Hein, 1997; Derrida & Jung-Muller, 1999) or of the whole genome, the number of ancestors grows as one
proceeds back in time, since each individual has two parents and, apart from coalescence, recombination (meaning
splitting of the branches in the tree) is also frequent.

If one considers a population or a sample of individuals within a population, there are relevant differences between 
the genealogy of a single gene and the genealogy of a chromosome or of the whole genome (which we study here). 
While in the first case, in fact, there exists a MRCA for the sample (where the gene tree ends), the genealogical 
tree of a chromosome or of the genome with two parents proceeds backwards in time and never reduces to a single 
ancestor. The genealogical tree representing the pedigree of a diploid organism contains a large fraction of the ancestral 
population. In this case, one may then talk about the most recent common set of ancestors, and study the similarities 
among different individuals now within the same population. 

II. STATISTICAL PROPERTIES OF AN INDIVIDUAL TREE 

Here we consider a simple neutral model of a closed population evolving under sexual reproduction and with
non-overlapping generations.[3] If the population size is $N(G)$ at generation $G$ in the past, we form couples at random
(by randomly choosing $N(G)/2$ pairs of individuals) and assign each couple a random number $k$ of descendants. The
probability $p_k$ of the number $k$ of offspring is given and, if the population size is $N$ at present, its size $N(G)$ at
generation $G$ in the past is given by

$$N(G) = \left( \frac{2}{m} \right)^G N \tag{1}$$

where the factor $m$ is obtained from

$$m = \sum_k k \, p_k . \tag{2}$$

[2] In this paper, we use the term genealogy to refer to the ancestry of a single gene or of a whole set of sequences. In all cases,
the genealogy is the complete set of ancestors contributing to the present object, this object being an individual (as in section
II), a group of individuals (as in section III), a sequence (section IV), or a single locus (as quoted here). In this case, correlated
genealogies simply means that the different sets of ancestors for the two loci are not independent.

[3] The Wright-Fisher model for allele frequencies rests on the same set of hypotheses (Wright, 1931; Fisher, 1930). More
recently, Serva and Peliti (Serva & Peliti, 1991) obtained a number of statistical results for the genetic distance between
individuals in a sexual population evolving in the absence of natural selection.

For $m = 2$, the population size remains constant in time, whereas for $m \neq 2$ the number of individuals in the next
generation is multiplied by a factor $m/2$. After a number of generations, the tree of each of the individuals in the
youngest generation is reconstructed. To quantify the contribution of each of the ancestors to the genealogical tree of
an individual, we define the weight $w_\gamma^{(\alpha)}(G)$ of an ancestor $\gamma$ in the tree of individual $\alpha$ at generation $G$ in the past
through the recursion

$$w_\gamma^{(\alpha)}(G+1) = \frac{1}{2} \sum_{\gamma' \ \text{children of} \ \gamma} w_{\gamma'}^{(\alpha)}(G) . \tag{3}$$



We take $w_\gamma^{(\alpha)}(0) = \delta_{\gamma,\alpha}$, as this ensures that at generation $G = 0$ all the weight is carried by the individual $\alpha$ itself. The
factor 1/2 in (3) keeps the sum of the weights normalized, $\sum_{\gamma=1}^{N(G)} w_\gamma^{(\alpha)}(G) = 1$, for any past generation $G$. The weight
$w_\gamma^{(\alpha)}(G)$ can be thought of as the probability of reaching ancestor $\gamma$ if one climbs up the reconstructed genealogical
tree of individual $\alpha$ by choosing at each generation one of the two parents at random. The weights essentially measure
the repetitions (see figure 1) in the genealogical tree. Without repetitions, $w_\gamma^{(\alpha)}(G)$ would simply be $2^{-G}$ for each
ancestor $\gamma$ in the tree.

As an illustration of the previous quantities, we represent in Fig. 1 the result of random matings inside a small
closed population of constant size $N = 14$ (thus $m = 2$) during 7 generations. The lines link progenitors with their
offspring. The grey scale gives the weight $w_\gamma(G)$ of each of the individuals in the tree. The numbers on the left, all
of them of the form $r/2^G$, give the weight of the leftmost individual in each generation. The denominators simply
indicate the potential maximum number of ancestors at each generation. As counted by the numerator, each of them
would appear repeated $r$ times in this tree if all the branches were explicitly shown.
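The construction of the weights can be illustrated with a short numerical sketch. It is not the procedure used to draw Fig. 1: for simplicity it assumes the variant of the model used in the appendix, in which every individual picks its two parents independently at random (instead of forming fixed couples). The function name `ancestor_weights` and the random seed are illustrative choices. Exact fractions are used so that every weight appears in the form $r/2^G$.

```python
import random
from fractions import Fraction


def ancestor_weights(n_pop, generations, rng):
    """Return the weights w_gamma(G), G = 0..generations, in the tree of individual 0.

    Assumption (appendix variant): each individual chooses two parents
    uniformly at random in the previous generation; they may coincide.
    """
    weights = [Fraction(0)] * n_pop
    weights[0] = Fraction(1)              # w(0) = delta: all weight on individual 0
    history = [weights]
    for _ in range(generations):
        parent_weights = [Fraction(0)] * n_pop
        for child_weight in weights:
            if child_weight == 0:
                continue                   # not in the tree, nothing to pass on
            for _parent in range(2):       # recursion (3): each parent receives half
                parent_weights[rng.randrange(n_pop)] += child_weight / 2
        weights = parent_weights
        history.append(weights)
    return history


rng = random.Random(1)                     # arbitrary seed, for reproducibility only
history = ancestor_weights(n_pop=14, generations=7, rng=rng)
for g, w in enumerate(history):
    in_tree = sum(1 for x in w if x > 0)
    print(f"G={g}: {in_tree} distinct ancestors, largest weight {max(w)}, total {sum(w)}")
```

With a population as small as $N = 14$, the number of distinct ancestors saturates after a few generations while the total weight stays equal to 1, which is the coalescence of branches described in the text.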

We further assume that the probability $p_k$ of having $k$ children per couple follows a Poisson distribution,
$p_k = m^k e^{-m}/k!$ (most of what follows could be easily extended to other choices of $p_k$). We represent in Fig. 2 the
probability for an English couple to have $k$ marrying sons during the period 1350-1986 (Dewdney, 1986). The solid
line corresponds to a Poisson distribution with average 1.15 (i.e., the average number of offspring per individual in
that period, which corresponds to $m = 2.3$ in our analysis), and implies that the total population is growing. These
data, spanning six centuries and taken over a homogeneous population, support the hypothesis that the number of
offspring is indeed Poisson distributed.[4]
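As a small numerical check of Eqs. (1) and (2) for the Poisson case, the sketch below evaluates the mean number of offspring per couple and the population size $G$ generations in the past. The function names and the truncation `k_max` are illustrative assumptions; the values $m = 2.3$ and $N = 75$ million are those quoted in the text.

```python
import math


def poisson_pmf(k, m):
    """Poisson probability p_k = m^k e^{-m} / k! of k children per couple."""
    return m**k * math.exp(-m) / math.factorial(k)


def mean_offspring(m, k_max=100):
    """Eq. (2): m = sum_k k p_k, truncated at k_max (the tail is negligible)."""
    return sum(k * poisson_pmf(k, m) for k in range(k_max))


def past_population(n_now, m, g):
    """Eq. (1): population size g generations in the past."""
    return (2.0 / m) ** g * n_now


print(mean_offspring(2.3))                 # recovers m for a Poisson law
print(past_population(75e6, 2.3, 21))      # size 21 generations ago for a present size of 75 million
```

For $m = 2$ the second function returns the present size for every $G$, as expected for a population of constant size.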

If we define $S^{(\alpha)}(G)$ as the fraction of the population (at a generation $G$ in the past) which does not belong to the
genealogical tree of individual $\alpha$ (i.e. such that $w_\gamma^{(\alpha)}(G) = 0$), one can show (see the appendix) that

$$S^{(\alpha)}(G+1) = \exp\left[ -m + m \, S^{(\alpha)}(G) \right] . \tag{4}$$

This recursion, together with the initial condition $S^{(\alpha)}(0) = 1 - 1/N$, determines this quantity for any $G$ (Derrida et
al., 1999).

For large $G$ and for any individual $\alpha$, this fraction $S^{(\alpha)}(G)$ converges to the fixed point $S(\infty)$ of (4). This gives
for $m = 2$ (i.e. for a population of constant size) a fraction $S(\infty) \simeq 0.2031878...$ which becomes extinct, so that the
remaining fraction $1 - S(\infty) \simeq 80\%$ of the population belongs to the genealogical tree of any individual $\alpha$. A similar
calculation shows that this 80% of the population which is not extinct after a large number of generations appears
in the genealogical trees of all individuals: if $S^{(\alpha,\beta)}(G)$ is the fraction of the population which does not belong to
either of the two trees of two distinct individuals $\alpha$ and $\beta$, $S^{(\alpha,\beta)}(G)$ satisfies the same recursion (4) as $S^{(\alpha)}(G)$, and
converges to the same fixed value $S(\infty)$. Thus, within this neutral model, an individual either becomes extinct (with
a probability of 20%) or becomes an ancestor of the whole population after a large number of generations (with a
probability of 80%). For an exponentially growing population with $m = 2.3$ as in figure 2, the results are the same
except for the precise value of $S(\infty)$ (for $m = 2.3$, one finds $S(\infty) \simeq 14\%$).
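The fixed point of the recursion $S(G+1) = \exp[-m + m S(G)]$ can be obtained by direct iteration from the initial condition $S(0) = 1 - 1/N$. The following minimal sketch does so; the function names, the population size $N = 10^6$ and the convergence threshold 0.01 are illustrative assumptions.

```python
import math


def extinct_fraction(m, n_pop, generations):
    """Iterate S(G+1) = exp(-m + m S(G)) starting from S(0) = 1 - 1/N."""
    s = 1.0 - 1.0 / n_pop
    trajectory = [s]
    for _ in range(generations):
        s = math.exp(-m + m * s)
        trajectory.append(s)
    return trajectory


def generations_to_converge(trajectory, tolerance=0.01):
    """First generation at which S(G) is within `tolerance` of its final value."""
    limit = trajectory[-1]
    return next(g for g, s in enumerate(trajectory) if abs(s - limit) < tolerance)


constant = extinct_fraction(m=2.0, n_pop=10**6, generations=80)
growing = extinct_fraction(m=2.3, n_pop=10**6, generations=80)
print(constant[-1], generations_to_converge(constant))
print(growing[-1], generations_to_converge(growing))
```

Since $S = 1$ is an unstable fixed point, the distance $1 - S(G)$ is multiplied by roughly $m$ at each step until it becomes of order one, so the number of generations needed to approach $S(\infty)$ grows like $\log N / \log m$.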



[4] Nonetheless, deviations from this distribution induced by a social transmission of the reproductive behaviour have been
reported (Austerlitz & Heyer, 1998).
When $G$ is large enough, as shown in the appendix, the whole distribution $P(w)$ of the weights $w_\gamma(G)$ reaches a
stationary shape, the properties of which can be calculated (Derrida et al., 1999). We show in Fig. 3 the distribution
$P(w/\langle w \rangle)$ for different values of $m$. As can be seen, it has a power-law dependence, $P(w) \propto w^\beta$, for small values of
the ratio $w/\langle w \rangle$, with an exponent given by

$$\beta = - \frac{\log S(\infty)}{\log m} - 2 , \tag{5}$$

and achieves a maximum value for $w/\langle w \rangle \simeq 1$.
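The exponent of the power law can be evaluated numerically once $S(\infty)$ is known. The sketch below assumes the relation $\beta = -\log S(\infty)/\log m - 2$ (the form obtained from the generating-function argument when each parent contributes exactly one half); the function names are illustrative.

```python
import math


def stationary_extinct_fraction(m, iterations=200):
    """Stable fixed point of S = exp(-m + m S), reached by iterating from S = 0."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-m + m * s)
    return s


def small_weight_exponent(m):
    """Exponent beta of P(w) ~ w^beta at small w, assuming beta = -log S / log m - 2."""
    return -math.log(stationary_extinct_fraction(m)) / math.log(m) - 2.0


for m in (2.0, 2.3):
    print(m, stationary_extinct_fraction(m), small_weight_exponent(m))
```

For $m = 2$ this gives $\beta \simeq 0.3$, a positive exponent, consistent with a distribution that vanishes at $w = 0$ and has its maximum at a finite value of $w/\langle w \rangle$.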



III. SIMILARITY BETWEEN TWO TREES 



We would like to know how similar the genealogical trees of two contemporary individuals are and how they evolve
in time within the same population. We have seen that a large fraction $1 - S(\infty) \simeq 80\%$ of the ancestral population
constitutes the pedigree of every present individual. As a next step, one can compare two individuals and compute
the degree of similarity between their trees, that is, the set of ancestors appearing at each generation in both trees
simultaneously. We will see in particular that the two trees become identical after a number $G_c$ of generations.

We start with the definition of the overlap between the genealogical trees of two different individuals, $\alpha$ and $\beta$. Let
$w_\gamma^{(\alpha)}(G)$ be the weight of the ancestor $\gamma$ in the tree of $\alpha$ at generation $G$ in the past, and similarly let $w_\gamma^{(\beta)}(G)$ be
the weight of the same ancestor $\gamma$ at generation $G$ for $\beta$. These weights evolve according to (3) with $w_\gamma^{(\alpha)}(0) = \delta_{\gamma,\alpha}$
and $w_\gamma^{(\beta)}(0) = \delta_{\gamma,\beta}$ at generation $G = 0$. In order to quantify the similarity between the two trees, we introduce the
quantities

$$X^{(\alpha)}(G) = \sum_{\gamma=1}^{N(G)} \left[ w_\gamma^{(\alpha)}(G) \right]^2$$

and

$$Y^{(\alpha,\beta)}(G) = \sum_{\gamma=1}^{N(G)} w_\gamma^{(\alpha)}(G) \, w_\gamma^{(\beta)}(G) .$$

$Y^{(\alpha,\beta)}(G)$ measures the correlation between the two trees at generation $G$ in the past and $X^{(\alpha)}(G)$ acts as a
normalization factor. We then define the overlap $q^{(\alpha,\beta)}(G)$ between the two trees at that generation by

$$q^{(\alpha,\beta)}(G) = \frac{Y^{(\alpha,\beta)}(G)}{\left[ X^{(\alpha)}(G) \, X^{(\beta)}(G) \right]^{1/2}} .$$

This overlap is a measure of the (cosine of the) angle between the two $N$-dimensional vectors $w_\gamma^{(\alpha)}(G)$ and $w_\gamma^{(\beta)}(G)$.[5]
When $q^{(\alpha,\beta)}(G) \simeq 0$, the two vectors are essentially orthogonal and the ancestors of $\alpha$ and $\beta$ are all different. On the
other hand, when $q^{(\alpha,\beta)}(G) \simeq 1$, the vectors are almost identical (as for brothers).
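The overlap is simply the cosine similarity of the two weight vectors. A minimal sketch, with weight vectors given as plain lists over the same ancestral generation (the function name `overlap` and the toy vectors are illustrative):

```python
import math


def overlap(weights_a, weights_b):
    """Overlap q = Y / sqrt(X_a X_b) between two weight vectors of equal length."""
    y = sum(a * b for a, b in zip(weights_a, weights_b))
    x_a = sum(a * a for a in weights_a)
    x_b = sum(b * b for b in weights_b)
    return y / math.sqrt(x_a * x_b)


# Disjoint sets of ancestors: orthogonal vectors, q = 0.
print(overlap([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]))
# Same ancestors with the same weights (e.g. brothers): q = 1.
print(overlap([0.25, 0.75, 0.0, 0.0], [0.25, 0.75, 0.0, 0.0]))
# One shared ancestor out of two: intermediate overlap.
print(overlap([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```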

For a large enough population, the fluctuations of $X^{(\alpha)}(G)$ and $Y^{(\alpha,\beta)}(G)$ are small around the population-averaged
values $\langle X(G) \rangle$ and $\langle Y(G) \rangle$ for almost all choices of $\alpha$ and $\beta$. Of course, if $\alpha$ and $\beta$ are brothers, $Y^{(\alpha,\beta)}(G) = X^{(\alpha)}(G)$,
a value very different from its average $\langle Y(G) \rangle$; it is however very unlikely to get brothers, sisters or even cousins if
one picks two individuals at random from a large population.

[5] Similar quantities have been proposed as an indicator of the amount of evolutionary divergence between populations (Kimura,
1983). The quantity analogous to our weight in the population genetics approach is the frequency of the sampled alleles,
the number of ancestors $\gamma$ corresponds to the number of genes (that is, the dimension of the space in which the vector $w^{(\alpha)}$ is
embedded), and our individuals $\alpha$ and $\beta$ correspond to the compared populations (Cavalli-Sforza & Conterio, 1960).

The averages $\langle X(G) \rangle$ and $\langle Y(G) \rangle$ can be calculated from the evolution of the weights (3). Initially, $X(0) = 1$ and
$Y(0) = 0$ since the individuals $\alpha$ and $\beta$ in any pair are different. Using the fact that for large $N$ the fluctuations of
$X^{(\alpha)}(G)$ and $Y^{(\alpha,\beta)}(G)$ are small, the expected value of the overlap $q(G)$ between two randomly chosen individuals
is given by

$$\langle q(G) \rangle \simeq \frac{\langle Y(G) \rangle}{\langle X(G) \rangle} = \frac{1}{1 + m^{G_c - G}} \tag{6}$$

where, to leading order in $N$,

$$G_c \simeq \frac{\log N}{\log m} .$$

This expression is derived in the appendix. Of course Eq. (6) is only valid with probability one with respect to the
random choice of $\alpha$ and $\beta$ and with respect to the dynamics. We see that for large $N$, the overlap $q(G)$ is essentially
zero for a number of generations of order $G_c \simeq \log N / \log m$ and then, within a number of generations $\Delta G$ which does
not depend on $N$, it becomes equal to unity. Fig. 4 displays the averaged overlap $q(G)$ as a function of the number
of generations $G$ for different values of $N$. We have chosen $m = 2$ so that the population remains constant in size.
We see that changing $N$ does not change the $G$ dependence except for a translation of the curve. In particular the
range $\Delta G$ over which the overlap changes from 1% to 99% does not depend on $N$. It is easy to check from (6) that for
$m = 2$, the overlap should satisfy

$$q(G+1) = \frac{2 \, q(G)}{1 + q(G)}$$

(plain line in the inset). The fixed point $q(G) = 0$ is unstable for this map. All the trajectories finally converge to the
stable fixed point $q(G) = 1$ for large $G$. Also the quantity $\Delta G$ can be estimated by counting how many generations
are required for the overlap to change from 1% to 99%, and this gives from (6)

$$\Delta G \simeq \log(10^4) / \log m ,$$

that is $\Delta G \simeq 14$ for $m = 2$ and $\Delta G \simeq 11$ for $m = 2.3$ as in figure 2. Typical values of $G_c$ are $G_c \simeq 20$ for a population
of constant size $N = 10^6$. For a population increasing with $m = 2.3$ as in figure 2, one gets $G_c = 21$ if the size in the
last generation is $N = N(0) = 75$ million.
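The sharpness of the transition can be checked directly on the closed form $\langle q(G) \rangle = 1/(1 + m^{G_c - G})$. The sketch below evaluates the width $\Delta G$ of the 1% to 99% window and verifies that, for $m = 2$, successive values obey the map $q \to 2q/(1+q)$. Function names are illustrative, and $G_c = 20$ is used only as an example value.

```python
import math


def mean_overlap(g, g_c, m):
    """Average overlap <q(G)> = 1 / (1 + m^(G_c - G))."""
    return 1.0 / (1.0 + m ** (g_c - g))


def transition_width(m, low=0.01, high=0.99):
    """Number of generations over which <q> rises from `low` to `high`."""
    # Inverting the closed form gives G = G_c - log((1 - q)/q) / log m.
    g_low = -math.log((1.0 - low) / low) / math.log(m)
    g_high = -math.log((1.0 - high) / high) / math.log(m)
    return g_high - g_low


print(transition_width(2.0))   # about 13.3 generations, independent of N
print(transition_width(2.3))   # about 11 generations

# For m = 2 the closed form is a trajectory of the map q -> 2q / (1 + q).
q = mean_overlap(15, 20, 2.0)
print(mean_overlap(16, 20, 2.0), 2.0 * q / (1.0 + q))
```

The width depends only on $m$, whereas the population size enters only through the position $G_c$ of the window.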

The previous analysis can be easily extended to the hypothetical case of having an arbitrary number $p$ of parents
instead of 2. As is shown in the appendix, the statistical properties of genealogical trees in a population of constant
size but arbitrary $p$ are the same as for a population with only two parents and an expanding or shrinking size
according to Eq. (1). The described statistical properties are thus equivalent in (i) a system with sexual reproduction
and a growth rate $m = p$ and (ii) a system with constant population size but a number $m$ of genders.

The existence of a generation $G_c$ around which the genealogical similarity among individuals changes from 0 to 1
and which grows logarithmically with the size of the population is one of our main results. This has to be compared
with the number of generations required for the population to become genetically homogeneous (Donnelly & Tavare,
1991; Harpending et al., 1998), which grows proportionally to $N$. The difference is that when $G_c \ll G \ll N$, all the
overlaps are 1, i.e. all the genealogical trees in the population have the same ancestors with the same weights, but
the genomes are still very different: this is just an extension of the situation of brothers who have exactly the same
genealogical tree but different genomes.



IV. SIMPLE MODEL FOR THE CONTRIBUTION OF THE ANCESTORS TO THE GENOME 

The evolution of a set of sequences subject to coalescence and recombination was first described by Hudson (1983).
In this case, evolution proceeds until the most recent common ancestor for each set of homologous sites has been
found. The set of MRCA sites does not necessarily belong to the genome of a single ancestor; on the contrary, it is
in general spread over a finite fraction of the original population (Wiuf & Hein, 1997; 1999). In this section, we focus
our attention on the statistical properties of the ancestry of a single extant genome. In particular, we calculate the
equilibrium distribution for the fraction of material contributed by each ancestor.

Consider the whole set of genes that a present diploid organism has inherited from its parents. Although both parents
contributed 50% each, it is no longer true that grandparents contributed 25% each, since independent assortment
of chromosomes plus crossing over mixed in each of the parental gametes the material inherited from the previous
generation. As a rough approximation to the output of genetic recombination, one might consider that each sequence
is obtained as the addition of a fraction $f$ of the genetic material of one parent and a fraction $1 - f$ of the genetic
material of the other parent, with $f \in (0, 1)$. This would be true if the length $L$ of the sequence was long enough (or
infinitely long), so that there would be no restriction on the number of times it could be divided, and if one could forget
the linear structure of the sequence. The process of coalescence and recombination (for small $N$) is schematically
represented in Fig. 5.

We can now repeat the previous analysis for the present extension. We will discard the correlations between
the values of $f$ coming from a couple. This is equivalent to our assumption that fixing the pairs for $k$ offspring or
choosing the parents of each individual at random only has effects of order $O(N^{-1})$ (see the appendix), and we can
therefore work in the simplest realization of the process. Hence, we assume that the fraction $f$ takes independent
values for each parent. The recursive equations (3) for the weights become

$$w_\gamma^{(\alpha)}(G+1) = \sum_{\gamma' \ \text{children of} \ \gamma} f_{\gamma'} \, w_{\gamma'}^{(\alpha)}(G) , \tag{9}$$

where the weight $w_\gamma^{(\alpha)}(G)$ now means the fraction of the genetic material of individual $\alpha$ inherited from ancestor $\gamma$ at
generation $G$. The random fraction $f$ is chosen anew for each offspring from a distribution $\rho(f)$ (with average value
$\langle f \rangle = 1/2$). This implies that now even brothers would have different weights for their ancestors, and hence brings us
slightly closer to the real genetic process.

Following the procedure described in the appendix, one can calculate the fraction $S$ of ancestors without lines of
descent in the present (as we also show in Sec. II) and the exponent $\beta$ for the distribution $P(w)$. In general, given
the distribution $\rho(f)$ for the contributions of the parents, we get

$$S(\infty) = e^{m S(\infty) - m} \tag{10}$$

$$1 = S(\infty) \, m^{\beta+2} \, \langle f \rangle^{\beta+1} \int f^{-\beta-1} \rho(f) \, df , \tag{11}$$

as one can easily show from (9) that the generating function $h_G(\lambda)$ defined by $h_G(\lambda) = \langle \exp[\lambda w(G) / \langle w(G) \rangle] \rangle$ has a
limit $h_\infty(\lambda)$ for large $G$ which satisfies

$$h_\infty(\lambda) = \exp\left[ -m + m \int \rho(f) \, df \; h_\infty\!\left( \frac{\lambda f}{m \langle f \rangle} \right) \right] .$$

Fig. 6 summarizes the changes in the distribution $P(w)$ for different distributions $\rho(f)$ of the random variable $f$.
We have considered a simple case of a population of constant size (i.e. $m = 2$) and with $\rho(f) = 1/(2\delta)$ uniform in the
interval $(1/2 - \delta, 1/2 + \delta)$. In this particular case, an implicit relation between $\delta$ and the exponent $\beta$ can be obtained,

$$\delta \beta = S(\infty) \left[ \left( \frac{1}{2} - \delta \right)^{-\beta} - \left( \frac{1}{2} + \delta \right)^{-\beta} \right] . \tag{12}$$

As $\delta$ varies, $P(w)$ remains a power law at small $w$ (i.e. $P(w) \propto w^\beta$), and the exponent $\beta$ decreases monotonically
with $\delta$. In particular, for $\delta \simeq 0.35$, $\beta$ changes sign: the maximum of $P(w)$ moves discontinuously from $w/\langle w \rangle \simeq 1$ to
$w/\langle w \rangle \simeq 0$. The exponents obtained through simulations of the process are represented in Fig. 7 together with the
numerical solution of Eq. (12), showing a good agreement.
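The implicit relation between $\delta$ and the exponent can be solved by bisection. The sketch below assumes the relation in the form $\delta\beta = S(\infty)[(1/2-\delta)^{-\beta} - (1/2+\delta)^{-\beta}]$, obtained by inserting the uniform distribution on $(1/2-\delta, 1/2+\delta)$ into the general condition for $m = 2$. Since $\beta = 0$ trivially satisfies this equation, the code solves for the ratio of the two sides being 1, which increases monotonically with $\beta$. The function names, the bracketing interval and the tolerance are illustrative choices, and this is not necessarily the numerical method used for Fig. 7.

```python
import math

S_INF = 0.2031878  # S(infinity) for m = 2, as quoted in Sec. II


def ratio(beta, delta):
    """S [(1/2 - delta)^(-beta) - (1/2 + delta)^(-beta)] / (delta * beta)."""
    a, b = 0.5 - delta, 0.5 + delta
    if abs(beta) < 1e-9:                       # limit beta -> 0 of the expression
        return S_INF * math.log(b / a) / delta
    return S_INF * (a ** -beta - b ** -beta) / (delta * beta)


def exponent(delta, lo=-0.99, hi=5.0, tol=1e-10):
    """Non-trivial root beta(delta) of ratio = 1, for 0 < delta < 1/2."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ratio(mid, delta) < 1.0:            # ratio grows with beta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_delta(lo=0.2, hi=0.45, tol=1e-8):
    """Value of delta at which beta changes sign."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exponent(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for delta in (0.01, 0.1, 0.2, 0.3, 0.4):
    print(delta, exponent(delta))
print("beta changes sign at delta =", critical_delta())
```

For $\delta \to 0$ the sketch recovers the exponent of the deterministic case $f = 1/2$, and the sign change is found close to the value $\delta \simeq 0.35$ quoted in the text.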



V. DISCUSSION 



We have analysed the statistical properties of genealogical trees generated inside a closed sexual population. We
focused our interest on the distribution of the repetitions of ancestors in the trees and on the amount of genetic
material contributing to an extant genome. The precise values of $\beta$, $S(\infty)$, $G_c$ and $\Delta G$ depend only weakly on the
details of the model and do not change qualitatively if, for instance, a non-Poissonian distribution of offspring is used.
Moreover, we have shown how our results can be extended to the hypothetical case of having an arbitrary number $p$
of parents: indeed, this case proves to be equivalent to a biparental population with a growth rate $m/2 = p/2$.

The problem analysed here presents a number of connections to other fields. Equations similar to (3) also appear
in the distribution of constraints in granular media, where the variables $w$ represent the force acting on each grain and
the recursion (3) expresses the way in which constraints are transmitted from one layer to the next (Coppersmith et
al., 1996). In this case, $p \neq 2$ and even fluctuating $p$ would be perfectly realistic. The fact that the overlap changes
from 0 to 1 within a small number of generations $\Delta G$ independent of the size of the population and after $G_c \simeq \log N$
generations is also very reminiscent of the sharp cutoff phenomenon characteristic of some natural mixing processes
modelled by Markov chains. One example of such systems is the shuffling of cards, where the stationary state in
which the system has lost almost all information about the initial ordering of the $n$ cards is reached through a sharp
cutoff after about $\log n$ riffle shuffles (Diaconis, 1996).

It is clear that the study of the interplay between the weights calculated in our generalized model and the structure 
of the genome would require more sophisticated approaches (Derrida & Jung-Muller, 1999; Wiuf & Hein, 1997; 
1999). We have discarded the correlations between the history of neighboring sites in a sequence and assumed the
independence of the factors $f$. Actually, the closer in the sequence two positions are, the more correlated are their
genealogical histories (Kaplan & Hudson, 1985). This fact constrains the possible breaking points for our simulated
sequences, implying that the random factors $f$ in (9) are a crude approximation to reality.

Since we have faced the problem from a statistical perspective, our results represent the average, typical behaviour, 
and are only valid with probability one when the population size is large. We did not study fluctuations due to the 
finite size of the population. Nonetheless, we hope that our results contribute to a better understanding of the role 
of genealogy in the degree of diversity of finite-size interbreeding populations. 



APPENDIX 

In this appendix we have regrouped the technical aspects of the derivations of the main equations
presented in the body of the paper.

One may consider several variants of the model which all give a Poisson distribution for the number of offspring 
when the size of the population is large. For instance, the population size could be strictly multiplied by a factor m/2 
at each generation or it could fluctuate (if we take the number of offspring from the Poisson distribution) . One might 
decide that each individual has two parents chosen at random in the previous generation or form fixed couples and 
assign each couple some children. All these variants do not change the results when the population size is large, but 
might affect some finite size corrections that we compute in this appendix. 

We will choose the following version of the model, which makes the calculation of the finite size corrections not too
difficult. Our population has a given size $N(G)$ at each generation $G$ in the past, and we will assume that all the
$N(G)$ are very large, at least in the range of generations $G$ that we will consider. Now, to construct the ancestors of
all the $N(G)$ individuals at generation $G$ in the past, we choose for each of them a pair of parents at random among
the $N(G+1)$ individuals at the previous generation (to facilitate the calculation, we do not even exclude that the two
parents might be equal). Within this model, the number $k$ of children of an individual at generation $G+1$ is random
and can be written as

$$k = \sum_{i=1}^{2N(G)} z_i$$

where $z_i = 1$ with probability $1/N(G+1)$ and $z_i = 0$ otherwise. It follows that the whole distribution of $k$ can be
calculated. The probability $p_k$ that an individual at generation $G+1$ has exactly $k$ children is given by the binomial
distribution

$$p_k = \frac{[2N(G)]!}{k! \, [2N(G)-k]!} \left( \frac{1}{N(G+1)} \right)^k \left( 1 - \frac{1}{N(G+1)} \right)^{2N(G)-k} . \tag{13}$$

In particular,

$$\langle k \rangle = \frac{2N(G)}{N(G+1)} , \qquad \langle k(k-1) \rangle = \frac{2N(G)[2N(G)-1]}{N(G+1)^2} , \qquad \langle k(k-1)(k-2) \rangle = \frac{2N(G)[2N(G)-1][2N(G)-2]}{N(G+1)^3} . \tag{14}$$

If the population size is multiplied by a factor $m/2$ at each generation, i.e. if $N(G) = N(G+1) \, m/2$ (as $G$ counts
the number of generations in the past), one recovers from (13) the Poisson distribution $p_k = m^k e^{-m}/k!$ for large
$N(G)$.
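The convergence of the binomial law to the Poisson law, and the factorial moments quoted above, can be checked numerically. In the sketch below the sizes $N(G+1) = 10^4$ and $N(G) = 11500$ (i.e. $m = 2.3$) are arbitrary illustrative values, as are the function names and the truncation at $k = 40$.

```python
import math


def binomial_offspring_pmf(k, n_children, n_parents):
    """Probability that one of n_parents individuals is chosen exactly k times
    when each of n_children individuals draws two parents at random."""
    trials = 2 * n_children
    p = 1.0 / n_parents
    return math.comb(trials, k) * p**k * (1.0 - p) ** (trials - k)


def poisson_pmf(k, m):
    return m**k * math.exp(-m) / math.factorial(k)


n_parents, n_children = 10_000, 11_500      # N(G+1) and N(G), so that m = 2.3
m = 2.0 * n_children / n_parents
pmf = [binomial_offspring_pmf(k, n_children, n_parents) for k in range(41)]

mean_k = sum(k * p for k, p in enumerate(pmf))
second_factorial = sum(k * (k - 1) * p for k, p in enumerate(pmf))
max_gap = max(abs(pmf[k] - poisson_pmf(k, m)) for k in range(15))

print(mean_k, 2 * n_children / n_parents)
print(second_factorial, 2 * n_children * (2 * n_children - 1) / n_parents**2)
print("largest |binomial - Poisson| for k < 15:", max_gap)
```

The difference between the two distributions is of order $1/N$, which is the size of the finite-size corrections mentioned in the text.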
A. Calculation of the density of individuals without long term descendants and derivation of (4)

To establish (4), one simply needs to notice that for an individual to have no descendants after $G+1$ generations,
all his children should have no descendants after $G$ generations. Let $M(G)$ be the number of individuals with no
descendants at generation $G$ in the past. Given $M(G)$, one can write $M(G+1)$ as

$$M(G+1) = \sum_{\gamma=1}^{N(G+1)} y_\gamma$$

where $y_\gamma = 1$ if all the children of $\gamma$ are among the $M(G)$ and $y_\gamma = 0$ otherwise. It can be shown that

$$\langle y_\gamma \rangle = \left( 1 - \frac{1}{N(G+1)} \right)^{2N(G) - 2M(G)}$$

and

$$\langle y_\gamma \, y_{\gamma'} \rangle = \left( 1 - \frac{2}{N(G+1)} \right)^{2N(G) - 2M(G)}$$

for $\gamma \neq \gamma'$. This gives

$$\langle M(G+1) \rangle = N(G+1) \left( 1 - \frac{1}{N(G+1)} \right)^{2N(G) - 2M(G)} \tag{15}$$

$$\langle M^2(G+1) \rangle = \langle M(G+1) \rangle + N(G+1) \left[ N(G+1) - 1 \right] \left( 1 - \frac{2}{N(G+1)} \right)^{2N(G) - 2M(G)} . \tag{16}$$

When all the $M$'s and $N$'s are large, we see from (15, 16) that the fluctuations of $M(G+1)$ are small (as
$\langle M^2(G+1) \rangle - \langle M(G+1) \rangle^2 \ll \langle M(G+1) \rangle^2$), and one finds from (15) that the ratio $M(G)/N(G) = S^{(\alpha)}(G)$ satisfies

$$S^{(\alpha)}(G+1) = \exp\left[ - \frac{2N(G)}{N(G+1)} \left( 1 - S^{(\alpha)}(G) \right) \right]$$

which is identical to (4) for $N(G) = N(G+1) \, m/2$.
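The recursion for the fraction of individuals without descendants can be compared with a direct Monte Carlo simulation of the model chosen in this appendix. The sketch below treats a population of constant size ($m = 2$); the population size, the number of generations, the seed and the function name are illustrative assumptions.

```python
import math
import random


def simulate_extinct_fraction(n_pop, generations, rng):
    """Fraction of the individuals of generation G (in the past) that have no
    descendants in the present generation, for G = 0..generations."""
    has_descendants = [True] * n_pop         # G = 0: the present generation itself
    fractions = [0.0]
    for _ in range(generations):
        parent_has_descendants = [False] * n_pop
        for alive in has_descendants:
            if alive:                         # only lineages reaching the present matter
                parent_has_descendants[rng.randrange(n_pop)] = True
                parent_has_descendants[rng.randrange(n_pop)] = True
        has_descendants = parent_has_descendants
        fractions.append(1.0 - sum(has_descendants) / n_pop)
    return fractions


rng = random.Random(0)                        # arbitrary seed
simulated = simulate_extinct_fraction(n_pop=20_000, generations=12, rng=rng)

predicted = [0.0]                             # recursion (4) with m = 2 and M(0) = 0
for _ in range(12):
    predicted.append(math.exp(-2.0 + 2.0 * predicted[-1]))

for g, (sim, pred) in enumerate(zip(simulated, predicted)):
    print(g, round(sim, 4), round(pred, 4))
```

The simulated fractions follow the recursion up to fluctuations of order $N^{-1/2}$ and settle near $S(\infty) \simeq 0.203$.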



B. Time evolution of the distribution of the weights 

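From the recursion (3) and from the known distribution (13) of $k$ one can write recursions for the moments of the
weights

$$\langle w_\gamma^{(\alpha)}(G+1) \rangle = \frac{\langle k \rangle}{2} \, \langle w_\gamma^{(\alpha)}(G) \rangle \tag{17}$$

$$\langle [w_\gamma^{(\alpha)}(G+1)]^2 \rangle = \frac{\langle k \rangle}{4} \, \langle [w_\gamma^{(\alpha)}(G)]^2 \rangle + \frac{\langle k(k-1) \rangle}{4} \, \langle w_\gamma^{(\alpha)}(G) \, w_{\gamma'}^{(\alpha)}(G) \rangle , \tag{18}$$

where $\gamma \neq \gamma'$. The normalization $\sum_\gamma w_\gamma^{(\alpha)} = 1$ allows one to rewrite

$$\langle w_\gamma^{(\alpha)}(G) \, w_{\gamma'}^{(\alpha)}(G) \rangle = \frac{\langle w_\gamma^{(\alpha)}(G) \rangle - \langle [w_\gamma^{(\alpha)}(G)]^2 \rangle}{N(G) - 1}$$

and together with the known moments (14) gives that

$$\langle w_\gamma^{(\alpha)}(G+1) \rangle = \frac{N(G)}{N(G+1)} \, \langle w_\gamma^{(\alpha)}(G) \rangle = \frac{1}{N(G+1)} \tag{19}$$

$$\langle [w_\gamma^{(\alpha)}(G+1)]^2 \rangle = \left[ \frac{N(G)}{2N(G+1)} - \frac{N(G) \, [2N(G) - 1]}{2 N(G+1)^2 \, [N(G) - 1]} \right] \langle [w_\gamma^{(\alpha)}(G)]^2 \rangle + \frac{2N(G) - 1}{2 N(G+1)^2 \, [N(G) - 1]} . \tag{20}$$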

For large $N(G)$, if the ratio $N(G+1)/N(G) = 2/m$, as in the case of a population increasing by a factor $m/2$ at
each new generation, expression (20) becomes simpler and one gets

$$\langle [w_\gamma^{(\alpha)}(G+1)]^2 \rangle = \frac{m}{4} \, \langle [w_\gamma^{(\alpha)}(G)]^2 \rangle + \frac{m^2}{4} \, \frac{1}{N(G)^2} . \tag{21}$$

In this limit, we have from (14) that $\langle k \rangle = m$ and $\langle k(k-1) \rangle = m^2$, and we see that (21) means that in (18) the
weights $w_\gamma^{(\alpha)}$ and $w_{\gamma'}^{(\alpha)}$ are, for large $N(G)$, uncorrelated. The calculation of higher moments of the weights can be
done in the same manner and for large $N(G)$ the weights of different ancestors become again uncorrelated.

If the population size changes in time, the distribution of the weights cannot be stationary. This is already visible in expression (17), which shows that even the first moment of the weights changes with $G$. One can however check from the recursions for the first two moments that the ratio $\langle [w_\gamma^{(\alpha)}(G)]^2\rangle / \langle w_\gamma^{(\alpha)}(G)\rangle^2$, which satisfies

$$\frac{\langle [w_\gamma^{(\alpha)}(G+1)]^2\rangle}{\langle w_\gamma^{(\alpha)}(G+1)\rangle^2} \simeq 1 + \frac{1}{m}\,\frac{\langle [w_\gamma^{(\alpha)}(G)]^2\rangle}{\langle w_\gamma^{(\alpha)}(G)\rangle^2} \tag{22}$$

has a limit $m/(m-1)$ as $G$ increases. Moreover, as the initial value of this ratio is $N(0)$, the number of generations $G_c$ to converge to this limit is $G_c \sim \log N(0)/\log m$. Higher moments of the weights behave in a similar way, and one can write recursions for ratios which generalize (22) and which show that all the ratios have limits.
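The convergence of the second-moment ratio can be illustrated by iterating its recursion numerically. This is a minimal sketch assuming the large-$N$ form $r(G+1) = 1 + r(G)/m$ for $r(G) = \langle w^2\rangle/\langle w\rangle^2$, with $r(0) = N(0)$; the helper name `ratio_trajectory` is hypothetical.

```python
import math


def ratio_trajectory(n0, m, generations):
    """Iterate r(G+1) = 1 + r(G)/m starting from r(0) = n0.

    r(G) stands for <w^2>/<w>^2 in the large-N limit (assumed form).
    Returns [r(0), ..., r(generations)].
    """
    r = float(n0)
    out = [r]
    for _ in range(generations):
        r = 1.0 + r / m
        out.append(r)
    return out


if __name__ == "__main__":
    n0, m = 4096, 2.0
    traj = ratio_trajectory(n0, m, generations=40)
    g_c = math.log(n0) / math.log(m)   # crossover scale log N(0) / log m
    limit = m / (m - 1.0)              # stationary value of the ratio
    for g in (0, 6, 12, 18, 40):
        print(g, traj[g])
    print("G_c =", g_c, "limit =", limit)
```

Under this recursion the distance to the limit shrinks by a factor $m$ per generation, $r(G) - m/(m-1) = [N(0) - m/(m-1)]\,m^{-G}$, which is why the ratio becomes of order one only after about $\log N(0)/\log m$ generations.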

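This indicates that the distribution of the ratio $w/\langle w\rangle$ becomes stationary. In the limit of large $N(G)$ (considering that the weights of the different children $\gamma'$ in the recursion for the weights can be taken as independent and that the distribution of $k$ becomes Poissonian), one finds that the generating function $h_G(\lambda)$ defined by

$$h_G(\lambda) = \left\langle \exp\left[\lambda\,\frac{w_\gamma^{(\alpha)}(G)}{\langle w(G)\rangle}\right]\right\rangle \tag{23}$$

satisfies

$$h_{G+1}(\lambda) = \sum_k p_k \left\langle \exp\left[\lambda\,\frac{w_{\gamma'}^{(\alpha)}(G)}{2\,\langle w(G+1)\rangle}\right]\right\rangle^k = \exp\left[-m + m\,h_G(\lambda/m)\right] \tag{24}$$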



Recursion (24) generalizes to the case $m \neq 2$ (i.e. the case of an exponentially increasing population) the result of our previous work obtained for a population of constant size ($m = 2$). Similar recursions have been studied in the theory of branching processes (Harris, 1963). The use of generating functions in population genetics is well illustrated in the book by Gale (1990), where this method is, for example, applied to the calculation of the probability of fixation of a mutant allele.
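Recursion (24) can be explored order by order in $\lambda$. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the recursion in the form $h_{G+1}(\lambda) = \exp[-m + m\,h_G(\lambda/m)]$, represents $h_G$ by its first Taylor coefficients, and starts from a single individual carrying all the weight in a population of size `n0`, so that $h_0(\lambda) = 1 - 1/n_0 + e^{\lambda n_0}/n_0$. All function names are hypothetical.

```python
import math


def series_exp(x, order):
    """Taylor coefficients of exp(x(lam)) up to lam**order, assuming x[0] == 0.

    Uses the standard recurrence n * E_n = sum_{k=1..n} k * x_k * E_{n-k}.
    """
    e = [1.0] + [0.0] * order
    for n in range(1, order + 1):
        e[n] = sum(k * x[k] * e[n - k] for k in range(1, n + 1)) / n
    return e


def step(h, m):
    """One generation of h_{G+1}(lam) = exp(-m + m * h_G(lam / m)) on coefficients."""
    order = len(h) - 1
    # The constant term of -m + m*h(lam/m) vanishes because h[0] == 1.
    x = [0.0] + [m * h[n] / m**n for n in range(1, order + 1)]
    return series_exp(x, order)


def initial_coefficients(n0, order):
    """h_0 for one individual holding all the weight: c_n = n0**(n-1) / n!."""
    return [1.0] + [n0 ** (n - 1) / math.factorial(n) for n in range(1, order + 1)]


def iterate(n0, m, generations, order=3):
    """Coefficients [c_0, ..., c_order] of h_G after the given number of generations."""
    h = initial_coefficients(n0, order)
    for _ in range(generations):
        h = step(h, m)
    return h
```

After many generations the coefficients returned by `iterate` approach the fixed point of the coefficient recursion, namely $c_2 = m/[2(m-1)]$ and $c_3 = m^2(m+2)/[6(m^2-1)(m-1)]$, independently of `n0`.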

It is remarkable that, if one considers an imaginary world where each individual would have $p$ parents (instead of 2), the generating function (23), in the case of a population of constant size, would satisfy the recursion (24) with $m = p$. This means that, as far as the distribution of weights is concerned, the problem of a large population of constant size with $m$ parents per individual is identical to the problem of a population whose size increases at each generation by a factor $m/2$, with two parents per individual.
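The equivalence stated above can be probed with a small Monte Carlo experiment. This is a rough sketch under explicit assumptions, none of which is taken from the original simulations: a population of constant size `n`, each individual choosing `p` parents uniformly at random with replacement, and each chosen parent receiving a share `1/p` of the child's weight. The function names are hypothetical.

```python
import random


def ancestral_weights(n, p, generations, rng):
    """Weights of the ancestors of individual 0, `generations` steps into the past."""
    w = [0.0] * n
    w[0] = 1.0  # generation 0: the individual itself carries all the weight
    for _ in range(generations):
        parents = [0.0] * n
        for child_weight in w:
            if child_weight == 0.0:
                continue  # individuals outside the tree pass nothing on
            share = child_weight / p
            for _ in range(p):
                parents[rng.randrange(n)] += share
        w = parents
    return w


def second_moment_ratio(w):
    """Population estimate of <w^2> / <w>^2."""
    n = len(w)
    mean = sum(w) / n
    return (sum(x * x for x in w) / n) / (mean * mean)


def absent_fraction(w):
    """Fraction of the population with exactly zero weight."""
    return sum(1 for x in w if x == 0.0) / len(w)


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (2, 3):
        w = ancestral_weights(n=4000, p=p, generations=30, rng=rng)
        print(p, second_moment_ratio(w), p / (p - 1.0), absent_fraction(w))
```

For large `n` and enough generations, the measured ratio should lie close to $p/(p-1)$, the value expected for $m = p$, up to finite-size fluctuations.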



C. Stationary distribution 



For large $G$, if we fix the ratio $N(G)/N(G+1) = m/2$, the generating function $h_G(\lambda)$ converges to $h_\infty(\lambda)$, solution of

$$h_\infty(\lambda) = \exp\left[-m + m\,h_\infty(\lambda/m)\right] \tag{25}$$

If one expands the solution around $\lambda = 0$, one finds that

$$h_\infty(\lambda) = 1 + \lambda + \frac{1}{2}\,\frac{m}{m-1}\,\lambda^2 + \frac{1}{6}\,\frac{m^2(m+2)}{(m^2-1)(m-1)}\,\lambda^3 + \dots$$

and the comparison with (23) gives for large $G$

$$\frac{\langle w^2\rangle}{\langle w\rangle^2} = \frac{m}{m-1}, \qquad \frac{\langle w^3\rangle}{\langle w\rangle^3} = \frac{m^2(m+2)}{(m^2-1)(m-1)},$$

which means that in principle the whole shape of $P(w)$ can be extracted from (25). In particular, one can predict the power law of $P(w)$ at small $w$. Trying to solve (25) for large negative $\lambda$, if one writes

$$h_\infty(\lambda) - S(\infty) \sim \frac{1}{|\lambda|^{\xi+1}} \tag{26}$$

one finds, as expected, that $S(\infty)$ is the fixed point of the recursion for $S$. Eq. (26) is equivalent to the assumption that $P(w) \sim w^{\xi}$ at small $w$, where the exponent $\xi$ should satisfy

$$1 = S(\infty)\,m^{\xi+2}.$$

This completes the derivation of the relation for the exponent, which was already discussed in our previous work (Derrida et al., 1999).
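As a worked numerical example (a sketch under stated assumptions), the exponent can be evaluated from the relation $1 = S(\infty)\,m^{\xi+2}$, taking $S(\infty)$ as the nontrivial solution of $S = \exp[-m\,(1-S)]$ for $m > 1$. The symbol $\xi$ for the exponent, the bisection solver and the function names are choices made here for illustration.

```python
import math


def stationary_absent_fraction(m, tol=1e-14):
    """Nontrivial root of S = exp(-m * (1 - S)) for m > 1, found by bisection.

    g(S) = exp(-m * (1 - S)) - S is positive at S = 0 and negative at its
    minimum S = 1 - ln(m)/m, so the root is bracketed by these two points.
    """
    def g(s):
        return math.exp(-m * (1.0 - s)) - s

    lo, hi = 0.0, 1.0 - math.log(m) / m
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def small_weight_exponent(m):
    """Exponent xi solving 1 = S(inf) * m**(xi + 2)."""
    s = stationary_absent_fraction(m)
    return -math.log(s) / math.log(m) - 2.0


if __name__ == "__main__":
    for m in (1.5, 2.0, 3.0, 4.0):
        print(m, stationary_absent_fraction(m), small_weight_exponent(m))
```

For a population of constant size ($m = 2$) this gives $S(\infty) \approx 0.20$ and an exponent close to $0.3$.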



D. Overlap between two trees 

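Let us now show how the expression for the overlap given earlier can be derived. Starting from the recursion for the weights, one obtains by averaging over all the links relating generation $G$ to generation $G+1$

$$\langle w_\gamma^{(\alpha)}(G+1)\,w_\gamma^{(\beta)}(G+1)\rangle = \frac{\langle k\rangle}{4}\,\langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle + \frac{\langle k(k-1)\rangle}{4}\,\langle w_\gamma^{(\alpha)}(G)\,w_{\gamma'}^{(\beta)}(G)\rangle \tag{27}$$

where $\gamma \neq \gamma'$ and the averages over $k$ are carried out with respect to (13). This gives

$$\langle w_\gamma^{(\alpha)}(G+1)\,w_\gamma^{(\beta)}(G+1)\rangle = \frac{m}{4}\,\langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle + \frac{1}{4}\left(m^2 - \frac{m}{N(G+1)}\right)\langle w_\gamma^{(\alpha)}(G)\,w_{\gamma'}^{(\beta)}(G)\rangle \tag{28}$$

Using the fact that the sum $\sum_\gamma w_\gamma^{(\alpha)}(G) = 1$, so that $\langle w_\gamma^{(\alpha)}(G)\rangle = 1/N(G)$ at all generations, one gets that

$$\langle w_\gamma^{(\alpha)}(G+1)\,w_\gamma^{(\beta)}(G+1)\rangle = \frac{m}{4}\,\langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle + \frac{1}{4}\left(m^2 - \frac{m}{N(G+1)}\right)\frac{\dfrac{1}{N(G)} - \langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle}{N(G)-1} \tag{29}$$

Keeping only the dominant contributions for large $N$'s we arrive at

$$\langle w_\gamma^{(\alpha)}(G+1)\,w_\gamma^{(\beta)}(G+1)\rangle = \frac{m}{4}\,\langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle + \frac{m^2}{4\,N(G)^2} \tag{30}$$

Comparing this expression with (27), one sees that for large $N$ one could have simply neglected the correlations between the weights of different individuals (i.e. directly replaced $\langle w_\gamma^{(\alpha)}(G)\,w_{\gamma'}^{(\beta)}(G)\rangle$ by $\langle w_\gamma^{(\alpha)}(G)\rangle\,\langle w_{\gamma'}^{(\beta)}(G)\rangle$ and used the Poisson distribution instead of (13)). The previous recursion can be integrated

$$\langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle = \left[\langle w_\gamma^{(\alpha)}(0)\,w_\gamma^{(\beta)}(0)\rangle + \frac{m}{(m-1)\,N(0)^2}\,(m^G - 1)\right]\left(\frac{m}{4}\right)^G$$

and using the fact that $\langle w_\gamma^{(\alpha)}(G)\,w_\gamma^{(\beta)}(G)\rangle$ is equal to $Y(G)/N(G)$ when $\alpha \neq \beta$ and to $X(G)/N(G)$ when $\alpha = \beta$ one finds (with $X(0) = 1$ and $Y(0) = 0$)

$$\frac{\langle Y(G)\rangle}{\langle X(G)\rangle} = \frac{(m^G - 1)\,m^{-G_c}}{1 + (m^G - 1)\,m^{-G_c}}$$

where $G_c$ is the crossover number of generations defined earlier. For large $N$, that is for large $G_c$, this reduces to the expression for the overlap given earlier in the whole range where the expression departs from 0 or 1, that is for $G$ of order $G_c$. Finally, one can check that with this value of $G_c$, $N(G)$ is always large, as long as $N$ is large, so that our assumption that all the $N$'s were large was legitimate.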
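The integration step can be checked numerically. The sketch below assumes the large-$N$ recursion $a(G+1) = (m/4)\,a(G) + m^2/[4N(G)^2]$ with $N(G) = N(0)\,(2/m)^G$, where $a$ stands for $\langle w_\gamma^{(\alpha)} w_\gamma^{(\beta)}\rangle$, and compares the ratio $Y/X$ obtained by direct iteration with the closed form $c\,(m^G - 1)/[1 + c\,(m^G - 1)]$, taking $c = m/[(m-1)N(0)]$ in the role of $m^{-G_c}$. That identification of $c$ follows from these assumed equations and is not quoted from the text; the function names are hypothetical.

```python
def iterate_overlap_ratio(n0, m, generations):
    """Return [Y(G)/X(G) for G = 0..generations] by iterating the recursion."""
    a_same = 1.0 / n0   # alpha == beta: X(0)/N(0) with X(0) = 1
    a_diff = 0.0        # alpha != beta: Y(0)/N(0) with Y(0) = 0
    ratios = [a_diff / a_same]
    for g in range(generations):
        n_g = n0 * (2.0 / m) ** g            # assumed population size at generation g
        source = m * m / (4.0 * n_g * n_g)   # large-N inhomogeneous term
        a_same = 0.25 * m * a_same + source
        a_diff = 0.25 * m * a_diff + source
        ratios.append(a_diff / a_same)
    return ratios


def closed_form_ratio(n0, m, g):
    """Closed form c*(m**g - 1) / (1 + c*(m**g - 1)) with c = m / ((m - 1) * n0)."""
    c = m / ((m - 1.0) * n0)
    growth = m ** g - 1.0
    return c * growth / (1.0 + c * growth)


if __name__ == "__main__":
    ratios = iterate_overlap_ratio(n0=1000, m=2.0, generations=30)
    for g in (0, 5, 10, 15, 30):
        print(g, ratios[g], closed_form_ratio(1000, 2.0, g))
```

With $m = 2$ and $N(0) = 1000$ the ratio rises from 0 to nearly 1 over a number of generations of order $\log N(0)/\log m$, and the iterated values coincide with the closed form.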
FIG. 1. Coalescence of branches in a genealogical tree. We display the reconstructed ancestry of a present individual in a small population of constant size $N = 14$. Numbers on the left side stand for the weight $w$ of the leftmost individual at each generation. The grey scale changes from light grey (small $w$) to dark grey (large $w$) proportionally to the logarithm of the weight. The exact values are calculated according to Eq. (2). The weight is a measure proportional to the number of times that an ancestor appears in a tree, or, equivalently, to the number of branches which have coalesced up to that point.




FIG. 2. Probability for an English couple to have $k$ marrying sons during the period 1350-1986 (open circles). The solid line corresponds to a Poisson distribution of average 1.15. Horizontal axis: number of males. (Data from Dewdney (1986)).

FIG. 3. Stationary shape of the distribution $P(w/\langle w\rangle)$ for different values of $m$. We compare the constant population case ($m = 2$) with shrinking ($m = 1.5$) and expanding ($m = 3, 4$) populations. Parameters are $N = 4096$, $G = 20$, and averages over 10 independent realizations have been performed. Horizontal axis: $w/\langle w\rangle$.




FIG. 4. The averaged overlap $q(G)$ as a function of the number of generations $G$. The results of simulations for different sizes of the population, $N = 100$ (open circles), 1000 (solid squares), 10000 (open diamonds), 100000 (solid triangles), agree with the analytical prediction, up to small finite-size corrections visible only for $N = 100$. The inset shows the results of simulations and the corresponding prediction.

FIG. 5. Representation of the first 5 generations of the tree in Fig. 1 with a random distribution of the weight of an individual between its two parents. The fraction $f$ of the weight contributed by each ancestor is randomly chosen from a distribution with average value $\langle f\rangle = 1/2$.

FIG. 7. Comparison between the predicted values of the exponent $\xi$ (solid line), given by the analytical expression for the exponent, and the results of the simulations for different values of $S$ (circles). Parameters as in Fig. 6. For a value of $S \approx 0.35$, the exponent $\xi$ changes sign. This point is important since the typical contribution of a randomly chosen ancestor changes abruptly by a finite amount.